Heterogeneity in a relationship between two variables means that their relationship “varies” with respect to some other variable(s). For our purposes, this can mean that the relationship varies in time (temporal heterogeneity) or it varies across locations (spatial heterogeneity).
Aaron first discovered heterogeneity in the doctor visits indicator, which inspired this DAP. The idea of sensorization, which we apply in this DAP to attempt to account for heterogeneity, has a long history in Delphi; see David’s thesis and Maria’s paper.
In this notebook, we perform a deep investigation of one particular sensorization procedure: a moving-window regression, fit on data from 10 weeks back to 1 week back. We find that dynamic sensorization is able to correct spatial heterogeneity (it improves geo-wise correlations), with neutral effects on time-wise correlations and on the ability to forecast future cases/deaths. Validation setups suggest that we may be overfitting to our targets. I would be far more confident in the geo-wise correlation results if we had some measures of spread (e.g., bootstrap confidence intervals).
Forecasting ability is measured through the “simple forecaster” setup that Ryan has built.
First, we fix notation. Consider an indicator and target (e.g., doctor visits and case rate), which we suppress notationally for brevity. Each observation is then represented as \((x_{t\ell}, y_{t\ell})\), where \(x\) is the indicator value, \(y\) is the target value, \(t\) represents time (measured in dates), and \(\ell\) represents location. Let \(L\) denote the set of all valid locations, e.g., all counties. Let \(x_{t\cdot}\) denote all the \(x_{t\ell}\) collected across locations in \(L\), and similarly for \(y_{t\cdot}\). Let \(x_{t_1:t_2, \ell}\), \(y_{t_1:t_2, \ell}\) denote the observations falling between times \(t_1\) and \(t_2\), endpoints included. Finally, let \(x, y\) be the collection of all observations across time and location.
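The notation maps naturally onto a long-format data frame with one row per \((t, \ell)\) pair. A minimal sketch, with purely hypothetical values and county FIPS codes:

```r
# One row per (time, location): x is the indicator value, y the target value.
obs = data.frame(
  time_value = rep(as.Date("2020-11-01") + 0:2, each = 2),
  geo_value  = rep(c("42003", "42101"), times = 3),  # hypothetical county FIPS
  x = c(1.2, 3.4, 1.3, 3.1, 1.1, 3.6),
  y = c(10, 30, 11, 29, 12, 33)
)
# x_{t.}: all locations on a single day
x_t = obs[obs$time_value == as.Date("2020-11-01"), "x"]
# x_{t1:t2, l}: one location over a date range, endpoints included
x_range = obs[obs$geo_value == "42003" &
              obs$time_value <= as.Date("2020-11-02"), "x"]
```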
This is just the unsensorized indicator itself.
In the basic, spatial-only form of sensorization (as computed in Aaron’s notebook), we ignore the possibility of temporal heterogeneity and learn a single linear relationship between the indicator and target for each location. Specifically, we learn, for each \(\ell \in L\) \[ y_{\cdot\ell} \sim x_{\cdot\ell} \qquad\Rightarrow \mathrm{Model}(\ell) \] and obtain the sensorized indicator values \[ \tilde x_{t\ell} = \texttt{predict}(x_{t\ell}, \mathrm{Model}(\ell)) = \hat y_{t\ell}. \] Although it was originally computed merely as a point of comparison, we are now seriously considering some form of “static” sensorization to avoid overfitting; more details are given in the sequel. It is important to note that although static sensorization does not allow the model to vary with time, the version considered in this notebook has the important advantage that its sensorized estimates are produced in-sample, i.e., each location’s model is trained on all dates, including those being evaluated.
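The per-location fit above can be sketched in a few lines. This is a toy illustration, not the actual pipeline: the data frame `df` and its column names are hypothetical stand-ins for the real covidcast inputs.

```r
# Static (spatial-only) sensorization: one linear model per location, fit on
# all available dates, then used to predict in-sample. `df` is a toy example.
set.seed(1)
df = data.frame(
  geo_value  = rep(c("al", "ak"), each = 50),
  time_value = rep(1:50, times = 2),
  x = rnorm(100, mean = 10)
)
df$y = 2 * df$x + rnorm(100, sd = 0.1)

static_sensorize = function(df) {
  out = lapply(split(df, df$geo_value), function(d) {
    model = lm(y ~ x, data = d)          # Model(l): y_{.l} ~ x_{.l}
    d$x_sensorized = predict(model, d)   # tilde x_{tl} = hat y_{tl} (in-sample)
    d
  })
  do.call(rbind, out)
}

df_sens = static_sensorize(df)
```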
Let \(k_1, k_2\) denote (negative) offsets, in days, delimiting the window of past data used when fitting our model. (A narrower window, i.e., smaller \(k_2 - k_1\), is “more adaptive” to temporal heterogeneity, but may also lead to less stable fits.)
We fit, for each \(t, \ell\): \[ y_{(t+k_1):(t+k_2), \ell} \sim x_{(t+k_1):(t+k_2), \ell} \qquad\Rightarrow \mathrm{Model}(\ell, t, k_1, k_2) \] and obtain the sensorized indicator values \[ \tilde x_{t\ell} = \texttt{predict}(x_{t\ell}, \mathrm{Model}(\ell, t, k_1, k_2)). \] That is, we fit a linear model for each location and day, and then take its prediction for that day.
For our purposes we take \((k_1, k_2) = (-70, -8)\). That is, we train over a nine-week window, excluding the immediately preceding week of data.
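The moving-window fit can be sketched for a single location as follows. The series `x`, `y` are toy stand-ins for the indicator and target; the real pipeline fits this per location across all dates.

```r
# Dynamic sensorization for a single location: for each day t, fit y ~ x on
# the window t+k1 .. t+k2 (k1 = -70, k2 = -8), then predict from x[t].
set.seed(1)
n = 120
x = rnorm(n, mean = 10)
y = 2 * x + rnorm(n, sd = 0.1)

dynamic_sensorize = function(x, y, k1 = -70, k2 = -8) {
  n = length(x)
  x_sens = rep(NA_real_, n)
  for (t in seq_len(n)) {
    idx = (t + k1):(t + k2)   # training window; the last 7 days are excluded
    idx = idx[idx >= 1]       # drop days before the series begins
    if (length(idx) < 2) next # need at least two points to fit a line
    model = lm(y ~ x, data = data.frame(x = x[idx], y = y[idx]))
    x_sens[t] = predict(model, data.frame(x = x[t]))
  }
  x_sens
}

x_sens = dynamic_sensorize(x, y)
```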
Overall, we find that geo-wise correlations improve, while the impact on time-wise correlations and on usefulness in simple forecasting is neutral.
For each of our sensorized indicators, we compute its geo-wise rank correlation against the target data. Assuming that the target data reflects the proper “ordering” of locations on each day, an improvement in the geo-wise rank correlations of the sensorized indicator means that, for a fixed day, the (sensorized) indicator’s values are (more) comparable across locations. This is important, for example, if we want the indicator to be intuitively useful on a map.
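Concretely, the geo-wise rank correlation for a fixed day is just the Spearman correlation of indicator against target across locations. A minimal sketch on toy data (location names hypothetical):

```r
# Geo-wise rank correlation for a single day: the Spearman correlation of
# indicator vs. target across locations.
set.seed(1)
one_day = data.frame(
  geo_value = sprintf("loc%02d", 1:30),
  indicator = runif(30)
)
# Any monotone transform preserves the ordering across locations exactly,
# so the rank correlation here is 1.
one_day$target = one_day$indicator^2
geo_cor = cor(one_day$indicator, one_day$target, method = "spearman")
```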
We generally find that the geo-wise correlations are improved by sensorization, both dynamic and static. This is encouraging. It is noteworthy that dynamic sensorization does essentially as well as static sensorization here, given that static sensorization has the added advantage of producing its estimates in-sample.
sensorize_time_ranges = list(
c(-42, -8),
c(-49, -8),
c(-56, -8),
c(-63, -8),
c(-70, -8)
)
splot_idx = 5
df_cor_list = vector('list', length(source_names))
for (ind_idx in 1:length(source_names)) {
base_cor_fname = sprintf('results/12_base_cors_%s_%s_%s_%s.RDS',
geo_level,
source_names[ind_idx], signal_names[ind_idx],
target_names[ind_idx])
df_cor_base = readRDS(base_cor_fname)
sensorize_fname = sprintf('results/12_sensorize_cors_%s_%s_%s_%s.RDS',
geo_level,
source_names[ind_idx], signal_names[ind_idx],
target_names[ind_idx])
sensorize_val_fname = sprintf('results/12_sensorize_vals_%s_%s_%s_%s.RDS',
geo_level,
source_names[ind_idx], signal_names[ind_idx],
target_names[ind_idx])
sensorize_cors = readRDS(sensorize_fname)[[splot_idx]]
df_cor = bind_rows(df_cor_base, sensorize_cors)
static_description = "Static Sensorization (Spatial)"
dynamic_description = "Dynamic Sensorization (-70:-8)"
df_cor$Indicator = sapply(df_cor$Indicator, function(x) {
if (x == 'Sensorized (Spatial)') {
return(static_description)
} else if (x == sprintf('Sensorized (TS, %d:%d)',
sensorize_time_ranges[[splot_idx]][1],
sensorize_time_ranges[[splot_idx]][2])) {
return(dynamic_description)
} else { return(x) }})
df_cor$Indicator = factor(df_cor$Indicator,
levels=c('Raw',
static_description,
dynamic_description))
df_cor$sensor_target = sensor_target_names[ind_idx]
df_cor_list[[ind_idx]] = df_cor
}
df_cor = bind_rows(df_cor_list)
df_cor$sensor_target = factor(
df_cor$sensor_target,
levels=sensor_target_names)
plt = ggplot(
df_cor,
aes(x = time_value,
y = value,
color=Indicator)
) + geom_line(
) + facet_wrap (
vars(sensor_target),
ncol=2,
) + theme_bw (
) + theme(
legend.position = "bottom",
legend.title = element_blank()
) + labs (
x = 'Date',
y = 'Rank correlation (geo-wise)'
)
print(plt)
For each of our sensorized indicators, we compute time-wise correlations over a moving 6-week window. Specifically, we iterate across time; on each date and for each location, we take the indicator values (raw, statically sensorized, and dynamically sensorized) from the preceding 42 days and compute the time-wise correlation with the corresponding target data. We then reduce out the location component by taking the median (and mad) of the correlations for each day.
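The trailing-window computation for one location can be sketched as follows; the series are toy stand-ins, and the median/mad reduction across locations is done afterwards in the real pipeline.

```r
# Trailing time-wise correlation for one location: on each day t, the Spearman
# correlation of indicator vs. target over the preceding 42 days.
set.seed(1)
n = 100
ind = cumsum(rnorm(n))          # hypothetical indicator series
tgt = ind + rnorm(n, sd = 0.5)  # hypothetical target, loosely tracking it

trailing_cor = function(ind, tgt, window = 42) {
  n = length(ind)
  out = rep(NA_real_, n)
  for (t in window:n) {
    idx = (t - window + 1):t    # the trailing 6-week window, inclusive
    out[t] = cor(ind[idx], tgt[idx], method = "spearman")
  }
  out
}

tc = trailing_cor(ind, tgt)
```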
Previously, Aaron found (communicated on Slack) that dynamic sensorization appeared to degrade the time-wise correlation of Doctor visits against cases. I reproduced this apparent result, but also plotted the spreads of the time-wise correlations. The distributions of the time-wise correlations are overlapping, so I am not as concerned. I believe that the most we can say is that dynamic sensorization does not provide any added advantage in time-wise correlations, compared to static sensorization or the raw indicator.
timewise_cors_list = vector('list', length(source_names))
for (ind_idx in 1:length(source_names)) {
timewise_cor_fname = sprintf('results/13_timewise_cors_%s_%s_%s_%s_-41_0.RDS',
geo_level, source_names[ind_idx],
signal_names[ind_idx], target_names[ind_idx])
timewise_cor_list = readRDS(timewise_cor_fname)
timewise_cors = timewise_cor_list[[1]]
timewise_cors_summarized = timewise_cor_list[[2]]
min_dynamic_correlate = timewise_cors %>% filter (
sensorization == 'dynamic',
) %>% pull (
correlate_date,
) %>% min
timewise_cors_summarized = timewise_cors_summarized %>% filter (
sensorization %in% c('raw', 'static') |
correlate_date >= min_dynamic_correlate+4,
)
timewise_cors_summarized$sensor_target = sensor_target_names[ind_idx]
timewise_cors_list[[ind_idx]] = timewise_cors_summarized
}
timewise_cors_summarized = bind_rows(timewise_cors_list)
timewise_cors_summarized$sensor_target = factor(
timewise_cors_summarized$sensor_target,
levels = sensor_target_names)
timewise_cors_summarized$sensorization = factor(
timewise_cors_summarized$sensorization,
levels=c('raw', 'static', 'dynamic'))
plt = ggplot(
timewise_cors_summarized,
aes(x=correlate_date,
colour=sensorization),
) + geom_line (
aes(y=med,
linetype='median'),
alpha=0.8,
) + geom_line (
aes(y=med+mad,
linetype='med±mad'),
) + geom_line (
aes(y=med-mad,
linetype='med±mad'),
) + scale_linetype_manual(
values=c("median"="solid",
"med±mad"="dashed"),
# "min/max"="dotted"),
breaks=c('median',
'med±mad'),
# 'min/max'),
) + geom_hline(
yintercept = 0,
linetype = 3,
color = "gray"
) + facet_wrap (
vars(sensor_target),
ncol=2,
scales='free_y',
) + theme_bw (
) + theme(
legend.position = "bottom",
legend.title = element_blank()
) + labs (
x='Date',
y='Rank correlation (time-wise, over trailing 6 weeks)'
) + ggtitle (
'Timewise correlations'
)
print(plt)
As an additional sanity check, I computed the pointwise differences in time-wise correlation within each location, and then took the median (in essence, interchanging the difference and median operations). My concern was that, even though the spreads of the time-wise correlations are large across locations, dynamic sensorization might still fare consistently worse than the raw indicator by a small amount.
What we find is more peculiar: the median pointwise differences between the time-wise rank correlations of the sensorized indicators and the raw indicator are actually positive. But again, the spreads are large.
timewise_cors_list = vector('list', length(source_names))
for (ind_idx in 1:length(source_names)) {
timewise_cor_fname = sprintf('results/13_timewise_cors_%s_%s_%s_%s_-41_0.RDS',
geo_level, source_names[ind_idx],
signal_names[ind_idx], target_names[ind_idx])
timewise_cor_list = readRDS(timewise_cor_fname)
timewise_cors = timewise_cor_list[[1]]
min_dynamic_correlate = timewise_cors %>% filter (
sensorization == 'dynamic',
) %>% pull (
correlate_date,
) %>% min
timewise_cors_summarized = timewise_cors %>% filter (
sensorization %in% c('raw', 'static') |
correlate_date >= min_dynamic_correlate+4,
) %>% pivot_wider (
names_from = sensorization,
values_from = value,
) %>% mutate (
static = static - raw,   # mutate() evaluates sequentially, so compute the
dynamic = dynamic - raw, # differences against raw before zeroing out raw
raw = raw - raw,         # itself; otherwise static/dynamic subtract zero
) %>% pivot_longer (
cols = c(raw, static, dynamic),
names_to = 'sensorization',
values_to = 'value',
) %>% group_by (
correlate_date,
sensorization,
) %>% summarize (
med=median(value, na.rm=TRUE),
mad=mad(value, na.rm=TRUE),
min=min(value, na.rm=TRUE),
max=max(value, na.rm=TRUE),
) %>% ungroup
timewise_cors_summarized$sensor_target = sensor_target_names[ind_idx]
timewise_cors_list[[ind_idx]] = timewise_cors_summarized
}
timewise_cors_summarized = bind_rows(timewise_cors_list)
timewise_cors_summarized$sensor_target = factor(
timewise_cors_summarized$sensor_target,
levels = sensor_target_names)
timewise_cors_summarized$sensorization = factor(
sapply(timewise_cors_summarized$sensorization,
function(x) {sprintf('(%s-raw)', x)}),
levels=c('(raw-raw)', '(static-raw)', '(dynamic-raw)'))
plt = ggplot(
timewise_cors_summarized,
aes(x=correlate_date,
colour=sensorization),
) + geom_line (
aes(y=med,
linetype='median'),
) + geom_line (
aes(y=med+mad,
linetype='med±mad'),
) + geom_line (
aes(y=med-mad,
linetype='med±mad'),
) + scale_linetype_manual(
values=c("median"="solid",
"med±mad"="dashed"),
# "min/max"="dotted"),
breaks=c('median',
'med±mad'),
# 'min/max'),
) + facet_wrap (
vars(sensor_target),
ncol=2,
) + theme_bw (
) + theme(
legend.position = "bottom",
legend.title = element_blank()
) + labs (
x='Date',
y='Change in rank correlation (time-wise, over trailing 6 weeks)'
) + ggtitle (
'Change in timewise correlations (compared to raw indicator)'
)
print(plt)
For each (indicator, target) pair, we perform the static and dynamic sensorizations, and then evaluate each sensorized indicator’s ability to “improve upon” forecasts made from the target alone. The setup is the same as in Ryan’s forecasting notebook. We perform 7-day-ahead and 14-day-ahead forecasts. The base model uses only the target information (cases or deaths), at lags of 7 and 14 days. The other three models each add one covariate to the target: the raw indicator, the statically sensorized indicator, or the dynamically sensorized indicator (in all cases at the corresponding lags).
The four models (for each (indicator, target) pair) are evaluated in terms of scaled error, which normalizes each error by the error of the strawman forecaster that propagates the previous week’s value forward. Scaled errors below 1 are desirable.
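The scaled error can be sketched as follows. The series and the “model” forecasts here are toy stand-ins, purely to show how the strawman normalization works.

```r
# Scaled error: a model's absolute error divided by the error of a strawman
# that propagates the week-old value forward as its forecast.
set.seed(1)
n = 60
y = cumsum(rnorm(n)) + 50  # toy target series
lead = 7

actual   = y[(lead + 1):n]  # y_{t+7}, the values being forecast
strawman = y[1:(n - lead)]  # strawman forecasts y_{t+7} with y_t
model_fc = actual + rnorm(n - lead, sd = 0.2)  # a stand-in model forecast

scaled_err = abs(actual - model_fc) / abs(actual - strawman)
# Scaled errors below 1 mean the model beats the strawman on that date.
```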
We find, on the whole, that including the sensorized indicator neither helps nor hurts in the prediction task. This is discouraging for us; we might hope that sensorization, when performed properly, would provide “additional information” beyond the target.
As an aside, it is interesting to note that the scaled errors get smaller, and the spread of their distributions narrower, in November. I suspect this is because we have seen a continual surge in cases since October, and the consistent upward trajectory makes the prediction task “easier”.
sensor_target_names = sapply(1:5, function(ind_idx) {
sprintf('%s\nTarget: %s', pretty_names[ind_idx], target_names[ind_idx])
})
model_names = c('Targets',
'Targets+Raw',
'Targets+Static',
'Targets+Dynamic')
lags = 1:2 * -7
leads = 1:2 * 7
plot_df_list = vector('list', length(source_names))
for (ind_idx in 1:length(source_names)) {
predictive_fname = sprintf('results/14_predictive_%s_%s_%s_%s.RDS', geo_level,
source_names[ind_idx], signal_names[ind_idx],
target_names[ind_idx])
res = readRDS(predictive_fname)
# Restrict to common period for all 4 models, then calculate the scaled errors
# for each model, that is, the error relative to the strawman's error
res_all4 = res %>%
#res_all4 = res %>% filter(geo_value %in% geo_values) %>%
drop_na() %>% # Restrict to common time
mutate(err1 = err1 / err0, err2 = err2 / err0, # Compute relative error
err3 = err3 / err0, err4 = err4 / err0) %>% # to strawman model
mutate(dif12 = err1 - err2, dif13 = err1 - err3, # Compute differences
dif14 = err1 - err4) %>% # relative to cases model
ungroup() %>%
select(-err0)
# Calculate and print median errors, for all 4 models, and just 7 days ahead
if (ind_idx > 1) {res_all4 = rename(res_all4, lead=lead_)}
res_err4 = res_all4 %>%
select(-starts_with("dif")) %>%
pivot_longer(names_to = "model", values_to = "err",
cols = -c(geo_value, time_value, lead)) %>%
mutate(lead = factor(lead, labels = paste(leads, "days ahead")),
model = factor(model, labels = model_names))
plot_df = res_err4 %>% group_by(
model, lead, time_value
) %>% summarize(
med = median(err),
mad = mad(err),
min = min(err),
max = max(err),
) %>% ungroup()
plot_df$sensor_target = sensor_target_names[ind_idx]
plot_df_list[[ind_idx]] = plot_df
}
plot_df = bind_rows(plot_df_list)
plt = ggplot(
plot_df,
aes(x=time_value,
colour=model)
) + geom_line (
aes(y=med,
linetype='median'),
) + geom_line (
aes(y=med+mad,
linetype='med±mad'),
) + geom_line (
aes(y=med-mad,
linetype='med±mad'),
) + scale_linetype_manual(
values=c("median"="solid",
"med±mad"="dashed"),
breaks=c('median', "med±mad")
) + scale_color_manual(
values = c("black",
'#F8766D',
'#00BA38',
'#619CFF')
) + geom_hline(
yintercept = 1,
linetype = 2,
color = "gray"
) + facet_grid(
cols=vars(lead),
rows=vars(sensor_target),
scales='free',
) + ylim (
0, 2
) + labs(
x = "Date",
y = "Scaled error (relative to strawman)"
) + theme_bw (
) + theme(
legend.position = "bottom",
legend.title = element_blank()
)
print(plt)
Because the models all performed the same in the previous setup, we cannot conclude that the sensorized indicators “help” the base model. But how do the sensorized indicators compare against each other? We repeat the predictive task, training each model with only the indicator (without Cases/Deaths as a base covariate). Any differences in their performance are overwhelmed by the size of the spread.
sensor_target_names = sapply(1:5, function(ind_idx) {
sprintf('%s\nTarget: %s', pretty_names[ind_idx], target_names[ind_idx])
})
model_names = c('Targets',
'Raw',
'Static',
'Dynamic')
lags = 1:2 * -7
leads = 1:2 * 7
plot_df_list = vector('list', length(source_names))
for (ind_idx in 1:length(source_names)) {
predictive_fname = sprintf('results/17_predictive_%s_%s_%s_%s.RDS', geo_level,
source_names[ind_idx], signal_names[ind_idx],
target_names[ind_idx])
res = readRDS(predictive_fname)
# Restrict to common period for all 4 models, then calculate the scaled errors
# for each model, that is, the error relative to the strawman's error
res_all4 = res %>%
#res_all4 = res %>% filter(geo_value %in% geo_values) %>%
drop_na() %>% # Restrict to common time
mutate(err1 = err1 / err0, err2 = err2 / err0, # Compute relative error
err3 = err3 / err0, err4 = err4 / err0) %>% # to strawman model
mutate(dif12 = err1 - err2, dif13 = err1 - err3, # Compute differences
dif14 = err1 - err4) %>% # relative to cases model
ungroup() %>%
select(-err0)
# Calculate and print median errors, for all 4 models, and just 7 days ahead
res_err4 = res_all4 %>%
select(-starts_with("dif")) %>%
pivot_longer(names_to = "model", values_to = "err",
cols = -c(geo_value, time_value, lead)) %>%
mutate(lead = factor(lead, labels = paste(leads, "days ahead")),
model = factor(model, labels = model_names))
plot_df = res_err4 %>% group_by(
model, lead, time_value
) %>% summarize(
med = median(err),
mad = mad(err),
min = min(err),
max = max(err),
) %>% ungroup()
plot_df$sensor_target = sensor_target_names[ind_idx]
plot_df_list[[ind_idx]] = plot_df
}
plot_df = bind_rows(plot_df_list)
plt = ggplot(
plot_df %>% filter (model != 'Targets'),
aes(x=time_value,
colour=model)
) + geom_line (
aes(y=med,
linetype='median'),
) + geom_line (
aes(y=med+mad,
linetype='med±mad'),
) + geom_line (
aes(y=med-mad,
linetype='med±mad'),
) + scale_linetype_manual(
values=c("median"="solid",
"med±mad"="dashed"),
breaks=c('median', "med±mad")
) + scale_color_manual(
values = c(
'#F8766D',
'#00BA38',
'#619CFF')
) + geom_hline(
yintercept = 1,
linetype = 2,
color = "gray"
) + facet_grid(
cols=vars(lead),
rows=vars(sensor_target),
scales='free',
) + ylim (
# 0, 2
0.5, 2.5
) + labs(
x = "Date",
y = "Scaled error (relative to strawman)"
) + theme_bw (
) + theme(
legend.position = "bottom",
legend.title = element_blank()
)
print(plt)
Our sensorization procedure runs the risk of “overfitting”. Up until now, we have evaluated our sensorized indicator by calculating correlations (or making elementary forecasts) against the very target used in the sensorization step. These evaluation metrics can be trivially maximized by replicating the target in the sensorization, e.g., by taking an extremely short time window.
In the sensorization procedure itself, we guard against overfitting by using a wide time window (9 weeks) and by ignoring the immediately preceding week of data (an air gap between training and evaluation data). In the following subsections, we provide two validation assessments.
Our first validation procedure takes advantage of newly available “ground truth” hospitalization data, provided by the Department of Health & Human Services at the State level.
The idea is as follows: if the sensorized Hospitalizations indicator has been overfit to Cases (or Deaths), then we would expect weakened correlations against this ground-truth hospitalization incidence. (Ideally, we would see improved correlations, as the sensorization should be “improving” the indicator, e.g., by correcting spatial heterogeneity.)
We ingest the data through Epidata. Then, we produce four target “indicators” from the data, each on the US State level:
Note that we (1) calculate incidence within each State by normalizing by population and multiplying by 100,000; and (2) shift the time_value of the data by one day, because by default each date reports on the new patients for the previous day.
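These two transformations can be sketched on a toy data frame. The column names and the population figure below are hypothetical, for illustration only.

```r
# (1) Normalize counts to incidence per 100,000; (2) shift time_value back one
# day, since each report covers the previous day's new patients.
hosp = data.frame(
  time_value   = as.Date("2020-11-01") + 0:2,
  geo_value    = "pa",
  new_patients = c(120, 150, 90)
)
state_pop = 12801989  # hypothetical state population

hosp$incidence  = hosp$new_patients / state_pop * 1e5
hosp$time_value = hosp$time_value - 1  # date d's report describes day d-1
```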
For two sensorized indicators (Hospitalizations, sensorized separately against Cases and Deaths), we compute the geo-wise correlations against the HHS Hospitalization indicators. We find that:
hosp_cor_df = readRDS('results/15_hhs_cor_df.RDS')
hosp_cor_df = hosp_cor_df %>% filter (
#indicator == 'Hospitalizations',
indicator %in% c('Doctor visits', 'Hospitalizations'),
)
hosp_cor_df$correlate_target = stringr::str_replace(
hosp_cor_df$correlate_target,
': ',
'\n')
plt = ggplot(
hosp_cor_df,
aes(x=time_value,
y=value,
colour=sensorization)
) + geom_line (
) + facet_grid (
rows = vars(correlate_target),
cols = vars(sensor_target),
) + theme_bw (
) + theme(
legend.position = "bottom",
legend.title = element_blank()
) + labs (
x='Date',
y='Rank correlation (geo-wise)'
)
print(plt)
As a second form of validation, we illustrate the coefficients fitted by the dynamic sensorization method, as a function of time. Examining these coefficients serves as a useful sanity check. In fact, the dynamic sensorization schemes originally considered, which involved short time windows, displayed superb improvements in the geo-wise correlations; but an investigation into the fitted coefficients revealed that those “improvements” were merely the byproduct of replicating cases in the intercept.
The coefficient distributions leave much to be desired. In the perfect world of our “data model”, the intercepts would be zero (e.g., no COVID-related doctor visits would correspond to no cases); and the slopes would carry the information, perhaps varying over time to account for temporal heterogeneity.
What we see in our data is quite different. The intercepts tend to be away from zero, while the slopes tend to be near zero (often, the median ± mad interval covers zero), despite our purposeful choice of a wide (nine-week) training window to avoid trivially fitting a moving average.
An additional cause for concern is the phenomenon observed from mid-August to mid-September (shaded in blue), in which the intercept values rise and the slope values fall, as though information is “shifting” from the slope terms into the intercept terms. This phenomenon is especially pronounced in the Doctor visits, Facebook %CLI, and Hospitalizations signals when sensorized against Cases.
Independent of the “meaning” of this simultaneous increase in the intercepts and decrease in the slopes, it is especially concerning that it happens for (nearly) all the indicators sensorized against Cases, at the same time. This suggests that the shifts are due to a change in the behavior of Cases, as a simultaneous change in the behavior of the indicators is unlikely. We should be especially mindful of this, i.e., that the “heterogeneity” being picked up by sensorization can be instability in the target data.
sensor_target_names = sapply(1:5, function(ind_idx) {
sprintf('%s // Sensorization target: %s', pretty_names[ind_idx], target_names[ind_idx])
})
df_coef_list = vector('list', length(source_names))
for (ind_idx in 1:length(source_names)) {
sensorize_val_fname = sprintf('results/12_sensorize_vals_%s_%s_%s_%s.RDS',
geo_level,
source_names[ind_idx], signal_names[ind_idx],
target_names[ind_idx])
sensorize_vals = readRDS(sensorize_val_fname)[[splot_idx]]
df_coef = sensorize_vals %>% select(
geo_value,
time_value,
intercept,
slope,
)
df_coef$sensor_target = sensor_target_names[ind_idx]
df_coef_list[[ind_idx]] = df_coef
}
coef_df = bind_rows(df_coef_list) %>% pivot_longer (
cols = c(slope, intercept),
names_to = 'coefficient',
values_to = 'value',
)
coef_df_summarized = coef_df %>% group_by (
time_value,
sensor_target,
coefficient,
) %>% summarize (
med = median(value, na.rm=TRUE),
mad = mad(value, na.rm=TRUE),
) %>% ungroup
make_coef_plot = function(ggp) {
ggp + geom_rect(
aes(xmin = lubridate::ymd('2020-08-15'),
xmax = lubridate::ymd('2020-09-15'),
ymin = -Inf,
ymax = Inf),
alpha = 0.03,
fill='#E0FFFF',
) + geom_hline(
yintercept = 0,
linetype = 2,
color = "gray"
) + geom_line (
aes(y=med,
linetype='median'),
) + geom_line (
aes(y=med+mad,
linetype='med±mad'),
) + geom_line (
aes(y=med-mad,
linetype='med±mad'),
) + scale_linetype_manual(
values=c("median"="solid",
"med±mad"="dashed",
"min/max"="dotted"),
breaks=c('median',
'med±mad',
'min/max'),
) + facet_wrap (
vars(sensor_target),
ncol=1,
scales='free_y',
) + theme_bw (
) + theme(
axis.title.x=element_blank(),
legend.position = "bottom",
legend.title = element_blank()
)
}
plt_intercept = ggplot(
coef_df_summarized %>% filter (coefficient=='intercept'),
aes(x=time_value),
) %>% make_coef_plot (
) + labs (
y = 'Intercepts'
)
plt_slope = ggplot(
coef_df_summarized %>% filter (coefficient=='slope'),
aes(x=time_value),
) %>% make_coef_plot (
) + labs (
y = 'Slopes'
)
gridExtra::grid.arrange(plt_intercept, plt_slope, ncol=2)
Sensorization, both dynamic and static, has been found to improve the geo-wise correlations of the sensorized indicator against its intended target. Changes in time-wise correlations and in the ability to forecast future cases/deaths are neutral. Validation setups suggest that the spatial correction does work, with some small areas of concern. Future directions include uncertainty estimates for the geo-wise correlations.